Abstract: In this paper, we first present a new explanation
for the relation between logical circuits and artificial neural
networks, logical circuits and fuzzy logic, and artificial neural
networks and fuzzy inference systems. Then, based on these
results, we propose a new neuro-fuzzy computing system which
can effectively be implemented on the memristor-crossbar
structure. One important feature of the proposed system is that its
hardware can be trained directly using the Hebbian learning rule,
without the need for any optimization. The system also copes
well with a huge number of input-output training data without
facing problems like overtraining.

Index Terms: Logical circuit, Fuzzy logic, Neural network,
Neuro-fuzzy computing system, Memristive device, Memristor
crossbar, Hebbian learning rule.



I. Introduction 

In past years, many efforts have been made to approach
the computing power of the human brain. These efforts can
roughly be categorized into several areas such as Artificial
Neural Networks (ANNs), Fuzzy Logic, etc. Reviewing these
works shows that most of them have concentrated only on
software, and a good example of a hardware implementation of
an intelligent system is rare. In addition, many of the suggested
structures or methods do not have biological support. Considering
the number of neurons in the human brain and the complexity
of the connections between them, the importance of having
efficient hardware that can be expanded to that scale
becomes clear. Given the nature of computation and memory in
the brain, it is now widely accepted that this hardware should
be analog, since in this case it can work much faster than
conventional digital circuits. However, until recently there was
a big obstacle to reaching this goal: there was no simple passive
element that could be used for storing and manipulating data
such as synaptic weights. Note that although analog values can
be stored in capacitors as voltage or charge, the stored values
cannot easily be read and used in computations without being
altered. In addition, because of leakage, the data stored in
capacitors varies in time. As a result, most of the analog
hardware proposed so far is inefficient and area-consuming
(see, for example, [1], [2], [3]).
In 2008, after the physical realization of the first memristive
device or memristor [4] (the two terms are used interchangeably
in the rest of the paper), which had been predicted by Leon Chua
in 1971 [5], the field of brain emulation was revived. This is
mainly because of similarities between the physical behavior of
the memristor and that of synapses in the brain [6], [7]. The
memristor is a passive device whose resistance (known as
memristance) or conductance (known as memductance) can be
changed by applying a suitable voltage or current to it.
Therefore, analog values such as synaptic weights can be stored
in this device by tuning its memristance. Unlike a capacitor, a
memristor can retain its memristance for a long period of time
in the absence of an applied voltage or current [8]. Moreover,
since a memristor simply acts as a time-varying resistor, it can
easily be used in many classical circuit designs.

The aforementioned properties of the memristor motivated some
researchers to develop new brain-like computing architectures
and methods, where the main focus was on the simplicity of
the resulting memristor-based hardware. The studies conducted
on this subject can be divided into two main categories. The
first category comprises the works that implement fuzzy
inference methods using memristive hardware [9], [10], [11].
For example, in [10] we showed that fuzzy relations (describing
the relation between input and output fuzzy concepts
or variables in an imprecise manner) can efficiently be formed
on memristor crossbar structures using the Hebbian learning
method. Moreover, we showed that the system constructed
by concatenating these basic units can perform fuzzy
computations. The second category consists of the studies
concentrated on hardware implementation of spiking neural
networks and their learning methods, such as Spike
Timing-Dependent Plasticity (STDP) [12], using memristor crossbar
structures fn\, HD, Q3), EU, E). In spite of their popular- 
ity, these networks suffer from some disadvantages. Firstly, 
there is no guarantee or proof for the convergence of the 
methods used for their training. Secondly, no perfect method 
has been proposed so far for training multi-layer spiking 
neural networks. This is why we do not see applications like 
function approximation (which requires multi-layer networks) 
to be implemented by these networks, although they exhibit 
excellent performance in some other applications like data 
classification [16|. Thirdly, spiking neurons have parameters 
(like threshold value of neurons) to be set. Finally, in these 
networks connection weights can be either positive or negative, 
which is a disadvantage from the hardware-implementation 
point of view.

In this paper, we propose a new computing system which 
leads to a simple hardware implementation and can remove 
some of the aforementioned difficulties. For this purpose, 
we start from logical circuits as the simplest multi-layer 
networks and reveal some of the similarities and dissimilarities 
between them and ANNs. For example, it will be demonstrated 
that logical circuits can be considered as networks whose 
connections can partly be tuned by using Hebbian learning 
rule [18]. This provides us with some ideas for improvement
of the performance of ANNs. Then, we show that even without 
changing the structure of conventional ANNs, their working 
procedure can be explained based on fuzzy concepts. It means 
that fuzzy inference systems have biological support. Combi- 
nation of these findings leads us to a new multi-layer neuro- 
fuzzy computing system with very interesting and important 
properties as summarized below. First of all, the proposed 
system can be trained without using any optimization methods. 
Second, it accepts inputs in fuzzy format and generates fuzzy 
outputs. Third, all connection weights in the proposed method 
are non-negative. Fourth, neurons of the network do not have 
any parameter to be tuned. Fifth, in the proposed structure
computing and memory units are merged with each other,
as in the human brain. Finally, as will be shown
in the rest of the paper, the most important advantage of our
method is that it can simply be implemented using memristor
crossbar structures. It is worth mentioning that, roughly
speaking, the complexity of the hardware needed to implement the
proposed neuro-fuzzy computing system is almost the same
as that of the hardware needed to implement a typical spiking
neural network.

The rest of this paper is organized as follows. In Section II
we review some similarities and dissimilarities between logical
circuits and neural networks to gain insight into how the
learning algorithms of neural networks can be improved.
Then, by extending digital logic so that it can work
with continuous variables, we arrive at fuzzy logic. Finally,
in the same section we show that the working procedure of
neural networks can be explained based on fuzzy concepts,
which means that fuzzy inference systems can have biological
support like neural networks. In Section III we develop our
own neuro-fuzzy computing system and its associated learning
method. Hardware implementation of the proposed method
based on memristor crossbar structures is described in Section
IV. Section V is devoted to the presentation of simulation
results, before the conclusion in Section VI.

II. The relations between digital logic, fuzzy logic and neural network

A. Similarities and dissimilarities between logical circuits and 
artificial neural networks 

Logical circuits can be considered as the simplest form of
multi-layer networks. To show this, first note that any binary
function can be written in the standard form of sum of products
(min-terms) [19]. For this purpose, one can simply add (more
precisely, OR) those min-terms that activate the function under
consideration. For example, Fig. 1 shows the structure used
to construct the sample binary functions $F_1$ and $F_2$. In this
figure, x, y, and z are the binary input variables, and each
logical gate acts only on the signals entering it from the
cross-points denoted by black circles.

Fig. 1. Typical logical circuit implementing two logical functions in the
standard form of sum of products.

The simple logical structure shown in Fig. 1 acts very
similarly to conventional ANNs. In fact, such a structure
can be considered as a two-layer network. The first part of
this structure, which consists of the input and hidden layers
(the hidden layer being the layer of AND gates), creates the
min-terms, and the second part, which consists of the hidden and
output layers (the output layer being the layer of OR gates),
adds the products and generates the final outputs. It is well
known that any binary function can be constructed using such a
network with input, hidden and output layers. Clearly, similar to
conventional ANNs [20], increasing the number of layers in the
structure shown in Fig. 1 will not enhance the accuracy of the
resulting binary function.

To gain more insight into the similarities between
ANNs and logical circuits, we can describe the role of the
first two layers of the circuit shown in Fig. 1 in another
manner. As can be observed, each input of this circuit is
multiplied by a specific weight, which is equal to either 1
or 0, and then enters the hidden layer. Denote the
weight matrix that multiplies the variables of the input layer to
form the variables of the hidden layer as $W^{logic}$. Clearly, each
row of this matrix, which is stored on the cross-points of the
first crossbar, corresponds to a certain combination of input
variables. More precisely, the rows of $W^{logic}$ determine the
min-terms used to create the functions at the system output,
and they are independent of the definition of the output functions.
Since $n$ input variables can generate, in general, $2^n$
min-terms, the number of rows of $W^{logic}$ is always smaller than
or equal to $2^n$. However, as a general observation, digital
functions usually do not need all of these $2^n$ min-terms to
be constructed.

According to the above discussion, when the
circuit shown in Fig. 1 is subjected to a certain binary input
string, the inner product of the input variables and the pattern
stored at each row of the crossbar (which corresponds to a specific
min-term) is computed to determine the similarity between
these two binary strings. Each AND gate then acts somewhat
like a neuron with a hard-thresholding activation function (in
contrast to OR gates, which are soft-thresholding operators):
only the output of the AND gate (neuron) located on the
row in complete agreement with the input variables is set to
1, and the output of the other AND gates (neurons) is set to 0.
Note that although each AND gate here acts as a thresholding
operator, unlike the activation function of neurons in ANNs,
its thresholding level is determined beforehand and cannot be
changed in practice. For example, when we have two input
variables and one 3-input AND gate, we should set one of its
input terminals to logic 1 to make sure that the activation
of the other two inputs can pass the threshold of the gate. Here,
it is worth mentioning that by recognizing the AND gate
(or equivalently the min-term) with activated output in Fig. 1,
one can determine the activated inputs and, consequently,
specify the concepts and events that occurred simultaneously. In
other words, the activation of each min-term indicates the
simultaneous occurrence of certain concepts
(represented by input terminals). As will be shown later, this
method of detecting the simultaneous occurrence of different
events is of essential importance in designing our proposed
neuro-fuzzy computing system.
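The inner-product view of the AND layer can be illustrated with a minimal sketch. It assumes, as is common in logical circuit design, that each variable is presented together with its complement, so that every stored row contains exactly n ones; the helper names `with_complements`, `minterm_rows` and `and_layer` are hypothetical and not part of the original circuit description.

```python
from itertools import product

def with_complements(bits):
    """Expand each bit b into the terminal pair (b, NOT b)."""
    terminals = []
    for b in bits:
        terminals += [b, 1 - b]
    return terminals

def minterm_rows(n):
    """Rows of the first crossbar: one 0/1 pattern per min-term, n ones each."""
    return [with_complements(p) for p in product((0, 1), repeat=n)]

def and_layer(bits, rows):
    """Inner product with every stored row, then a hard threshold at n."""
    u = with_complements(bits)
    n = len(bits)
    return [int(sum(w * t for w, t in zip(row, u)) >= n) for row in rows]

rows = minterm_rows(3)               # the 8 min-terms of x, y, z
hidden = and_layer((1, 0, 1), rows)  # only the row matching x=1, y=0, z=1 fires
print(hidden)
```

Under these assumptions the inner product counts the positions in which the input agrees with the stored pattern, so the threshold n is reached only on complete agreement.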

In the second part of the circuit shown in Fig. 1, the
min-terms generated in the first part are combined
to produce the final outputs of the system. Unlike the
first part, the weights connecting the AND gates of the hidden
layer to the OR gates of the output layer are not predictable
and depend fully on the definition of the binary functions.
However, the interesting point is that these weights can be
adjusted following the Hebbian learning rule [18] used in ANNs.
More precisely, it can easily be verified that when, for a certain
input, an output of the system is equal to logic 1, the output of
the activated AND gate (neuron) is connected to that activated
output. This means that by simultaneously applying the input
and output data to the input and output layers, respectively, and
using the Hebbian learning rule, the connections of the second
part of the logical circuit can be constructed automatically. In
addition, note that similar to most ANNs, the thresholding
strength of the output neurons (i.e., OR gates) is weaker than
that of the neurons of the hidden layer (i.e., AND gates).
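The Hebbian construction of the second crossbar can be sketched as follows. This is only an illustration under stated assumptions: the hidden layer is taken to contain all $2^n$ min-terms, a connection is set to 1 whenever a min-term and an output are active together, and the target functions (XOR and AND) as well as the names `active_minterm`, `hebbian_train` and `evaluate` are hypothetical examples.

```python
from itertools import product

def active_minterm(bits):
    """Index of the one AND gate that fires for a binary input string."""
    index = 0
    for b in bits:
        index = 2 * index + b
    return index

def hebbian_train(samples, n_inputs, n_outputs):
    """Connect min-term j to output i whenever both are active together."""
    W = [[0] * (2 ** n_inputs) for _ in range(n_outputs)]
    for bits, outputs in samples:
        j = active_minterm(bits)
        for i, out in enumerate(outputs):
            if out == 1:
                W[i][j] = 1
    return W

def evaluate(W, bits):
    """OR layer: only one min-term is active, so each output is its weight."""
    j = active_minterm(bits)
    return [row[j] for row in W]

# Hypothetical targets: F1 = x XOR y, F2 = x AND y.
samples = [((x, y), (x ^ y, x & y)) for x, y in product((0, 1), repeat=2)]
W = hebbian_train(samples, n_inputs=2, n_outputs=2)
print(W)
```

In this sketch a single pass over the truth table is enough, and no error signal or optimization step is involved.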

Although multi-layer logical circuits act very similarly to
ANNs, they have considerable advantages that we try
to exploit in our proposed computing system. Firstly, in
logical circuits all input and output signals, as well as the
weights stored in cross-points, are non-negative in nature.
Secondly and more importantly, unlike ANNs, any binary
function can be realized by training only the weights stored in
the second crossbar of the two-layer logical circuit shown in
Fig. 1, without using any optimization method (recall that the
pattern stored in the first crossbar is independent of the
particular binary functions to be constructed and can be
determined only by knowing the number of
input variables). Thirdly, assuming that $m_i$ and $\lor$ stand for
the $i$th min-term and the OR operator, respectively, we have
$m_i \lor m_i = m_i$. It follows that repeated min-terms (i.e., putting
emphasis on the simultaneous occurrence of several concepts
or inputs) in the definition of a binary function (expressed
in the form of sum of products) have no effect on the
input-output relation of the resulting structure. In other words,
assuming that the weights stored in the second crossbar are
adjusted using the Hebbian learning method, subjecting the
system to repeated input-output data will not change anything
in the system. Clearly, this is an important capability that ANNs
do not have. Fourthly, any binary function can be expressed
in the form of sum of products. This provides an easy
way to explain and understand the role of binary functions
based on concepts occurring simultaneously, since it is
similar to the logic used by the human brain. In other
words, one can easily discover the task of a binary function
expressed in the form of sum of products, while it is quite
difficult to obtain such knowledge by decoding the synaptic
weights of a given ANN.

Finally, it should be noted that although using the complement
of a binary variable provides no additional information,
it is still common practice to use both a binary variable
and its complement in logical circuit design. In fact, it seems
that in logical circuit design each distinct value of an input
variable is considered as an individual or independent concept,
probably because our brain prefers to work with two
contradicting concepts rather than a single non-contradicting
one. This is in complete contrast with ANNs, in which we
usually use only one input for each continuous input variable.
In other words, similar to fuzzy logic [10], here
we are considering one input terminal for each distinct value
of an input variable, and the input applied to any of these
terminals represents our confidence degree about the occurrence
of its corresponding concept. For this reason, in the following
discussion one input terminal is considered for each distinct
value that an input or output variable can take.

B. The relation between logical circuits and fuzzy logic 

One main question that arises at this point is: "Can we
extend the multi-layer logical circuit shown in Fig. 1 such that
it can work with non-binary input variables?" Fortunately, the
answer to this question is positive thanks to fuzzy logic.
In fact, the method used to express a binary function in the
form of sum of products is very similar to the method of
constructing a continuous function based on fuzzy rules. In
the following we discuss this subject in more detail.

First, consider a multi-input-single-output fuzzy inference
system whose input-output relation can be considered as the
mathematical map $X \to Y$, where $X \subset \mathbb{R}^n$ and $Y \subset \mathbb{R}$.
Moreover, assume that the output of this fuzzy inference
system is obtained by aggregation of the output of $N$ fuzzy
rules of the form [21]:

$$R^{(k)}: \text{IF } x_1 \text{ is } A_1^k \text{ AND } x_2 \text{ is } A_2^k \text{ AND } \ldots \text{ AND } x_n \text{ is } A_n^k \text{ THEN } y \text{ is } B^k, \quad k = 1, \ldots, N, \qquad (1)$$

which, for better explanation of the similarities between fuzzy
logic and logical circuits, can be rewritten as:

$$R^{(k)}: \text{IF } x \text{ is } S_n^k \text{ THEN } y \text{ is } B^k, \qquad (2)$$

where $x_i$ ($i = 1, 2, \ldots, n$) and $y$ are input and output
variables, respectively, $x = [x_1, x_2, \ldots, x_n] \in X$, $y \in Y$,
$\{A_1^k, A_2^k, \ldots, A_n^k\}$ are fuzzy sets with membership functions
$\mu_{A_i^k}(x_i)$ ($i = 1, 2, \ldots, n$; $k = 1, 2, \ldots, N$) defined on the
universal set of input variables, $B^k$ ($k = 1, 2, \ldots, N$) are
fuzzy sets defined on the universal set of the output variable,
and $S_n^k$ is an $n$-dimensional fuzzy set with the following
multi-input membership function:

$$\mu_{S_n^k}(x_1, x_2, \ldots, x_n) = t\left(\mu_{A_1^k}(x_1), \mu_{A_2^k}(x_2), \ldots, \mu_{A_n^k}(x_n)\right), \qquad (3)$$

where $t$ is a t-norm operator. Since the fuzzy propositions in
the antecedent part of (1) are combined using the fuzzy AND
operator, we call such rules AND-type fuzzy rules. According
to the properties of the t-norm operator, the fuzzy proposition
in the consequent part of an AND-type fuzzy rule is true (in
the fuzzy sense) if all of the fuzzy propositions in the antecedent
part of that rule are true (again in the fuzzy sense). Similarly,
combining the fuzzy propositions of the antecedent part
with fuzzy OR leads to an OR-type fuzzy rule. As in
logical circuits, AND-type fuzzy rules are often preferred since
our brains find them more reasonable. Here, we can see the
structural and behavioral similarities between fuzzy logic and
logical circuits. For example, the antecedent part of (1) plays the
role of the creation of min-terms in the first part of the circuit
shown in Fig. 1, with the difference that here the $A_i^k$ are fuzzy
numbers (in logical circuits the $A_i^k$ were crisp numbers
chosen from the set $\{0, 1\}$). In addition, aggregation of the
output of $N$ fuzzy rules using an s-norm operator is similar
to the function of the second part of the logical circuit shown
in Fig. 1. Finally, note that in fuzzy logic all inputs, outputs
and coefficients are non-negative as well.
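The way an AND-type antecedent generalizes a logical min-term can be illustrated with a short sketch. It is not taken from the paper: triangular membership functions, the min operator as the t-norm, and the names `tri`, `firing_strength`, `low` and `high` are all assumptions chosen for illustration.

```python
def tri(a, b, c):
    """Triangular membership function with support (a, c) and peak at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

def firing_strength(antecedent_sets, x, t_norm=min):
    """Combine the per-variable membership degrees with a t-norm."""
    degrees = [mu(xi) for mu, xi in zip(antecedent_sets, x)]
    strength = degrees[0]
    for d in degrees[1:]:
        strength = t_norm(strength, d)
    return strength

low, high = tri(-1.0, 0.0, 1.0), tri(0.0, 1.0, 2.0)
rules = [(low, low), (low, high), (high, low), (high, high)]

fuzzy_case = [firing_strength(r, (0.25, 0.75)) for r in rules]
crisp_case = [firing_strength(r, (0.0, 1.0)) for r in rules]
print(fuzzy_case)  # several rules fire partially
print(crisp_case)  # exactly one rule fires fully, like a logical min-term
```

With inputs restricted to 0 and 1, the four rules behave as the four min-terms of two binary variables, while intermediate inputs activate several rules to different degrees.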

Now, let us look at the working procedure of each fuzzy
rule in another way. According to (2) and (3), we interpret
the antecedent part of each fuzzy rule as a subspace of
$\mathbb{R}^n$ (denoted as $S_n^k$) with non-crisp (fuzzy) borders. In fact,
$x^* \in \mathbb{R}^n$ belongs to the region defined by the antecedent part
of the $k$th rule if $\mu_{S_n^k}(x^*)$ is large (i.e., close to unity), and
vice versa. More precisely, any $x^* \in \mathbb{R}^n$
belongs to the subspace $S_n^k$ with the confidence degree
$\mu_{S_n^k}(x^*)$ (for $k = 1, 2, \ldots, N$). Clearly, when the point
corresponding to a crisp input lies in any of these subspaces,
this indicates the simultaneous occurrence of the fuzzy concepts
defined by these subspaces (or equivalently, by the corresponding
$A_i^k$). For this reason, we call the subspace specified by the
antecedent part of each fuzzy rule a fuzzy min-term (in the
special case when each $A_i^k$ is a fuzzy set with a singleton
membership function on the support set $\{0, 1\}$, each fuzzy
min-term reduces to a logical min-term, denoted by a single point
in the space of input variables). Since each
of the fuzzy sets used in the antecedent part of fuzzy rules can
have a different membership function and support set, an infinite
number of unique fuzzy min-terms can be defined on the
$n$-dimensional space of input variables. Fortunately, in most
real-world applications the input data are concentrated in
certain parts of the space of input variables, and consequently
it is sufficient to define a limited number of fuzzy min-terms
that cover only those areas. In other words,
although the number of fuzzy min-terms can theoretically be
very large, commonly only a few of these min-terms
have a considerable influence on the system output. Hence, to
construct a fuzzy inference system we need to identify
only the most important fuzzy min-terms (or fuzzy rules).

Now consider the case in which the $k$th rule given in (1) is
subjected to the following fuzzy input data:

$$q: x_1 \text{ is } A'_1 \text{ AND } x_2 \text{ is } A'_2 \text{ AND } \ldots \text{ AND } x_n \text{ is } A'_n, \qquad (4)$$

which can equivalently be expressed as:

$$q: x \text{ is } S'_n, \qquad (5)$$

where $\{A'_1, A'_2, \ldots, A'_n\}$ are fuzzy sets with membership
functions $\mu_{A'_i}(x_i)$ ($i = 1, 2, \ldots, n$), and $S'_n$ is a fuzzy set
with multi-variable membership function $\mu_{S'_n}(x)$. The input
data given above stimulates the antecedent part of each of the
fuzzy rules given in (1) to a certain degree. Clearly, the amount
of activation of each fuzzy rule determines the contribution of
the consequent part of that rule to the final output. Various
methods are available to determine the amount of activation
of the antecedent part of a fuzzy rule for the given fuzzy input
data [21]. However, in the following we want to deal with this
issue from a different point of view.

Consider again the fuzzy input data given in (5), which
specifies the subspace $S'_n$ with fuzzy borders. In general,
this subspace overlaps with the subspace of each of the fuzzy
min-terms, i.e., the $S_n^k$, in the $n$-dimensional space to a certain
degree. But, unlike in binary logic, $S'_n$ most likely does not
completely overlap with any of the $S_n^k$ ($k = 1, \ldots, N$), and
consequently it is reasonable to assume that various fuzzy
min-terms are activated to different degrees when the system
is subjected to this fuzzy input. More precisely, the degree of
activation of the $k$th fuzzy rule is proportional to the amount
of overlap between the subspaces $S_n^k$ and $S'_n$. So, similar to
binary logic, the fuzzy inference system first evaluates the
similarity between the fuzzy input data and the fuzzy min-terms
and then applies a kind of soft-thresholding function (i.e., the
s-norm operator) to determine the contribution of the output
corresponding to each fuzzy min-term to the final outcome.

Here, we tried to show some of the similarities between
fuzzy logic and logical circuits. On the other hand, in the
previous section we demonstrated how the aggregation of logical
min-terms can be performed using the Hebbian learning rule.
Therefore, it seems that, taking inspiration from logical circuits,
we can propose a simple method to create fuzzy rules
automatically based on the available training data. However, the
main importance of the proposed method relates to its ability
to work with ANNs. In fact, in the rest of this paper we will
show that our proposed method can be used to train large-scale
ANNs, which can be considered a big step toward
emulating the computing power of the human brain. This
is mostly because, as we will show in the next
section, ANNs can be considered as systems that work with
fuzzy concepts.

C. Interpretation of ANNs as fuzzy inference systems 

The aim of this section is to provide an answer to the
following questions from a new point of view: (1) How
similar is the behavior of ANNs to that of fuzzy inference systems?
(2) How can we effectively and optimally determine and
implement the fuzzy min-terms and, consequently, the inference
engine of a very large-scale system? By answering the former
question we can show that fuzzy inference systems have
biological support. To answer these questions and
finally propose our method for hardware implementation of
neuro-fuzzy computing systems, we first describe the function
of ANNs based on fuzzy concepts in a new manner.

Fig. 2. Typical artificial neural network with one hidden layer. In these
systems, connection weights are usually determined during the learning
process by the use of optimization methods.

Consider the two-layer ANN shown in Fig. 2. In this figure:

• $x = [x_1, x_2, \ldots, x_{nx}]$ and $y = [y_1, y_2, \ldots, y_{ny}]$ are the
vectors of input variables entered to the input layer,

• $v = [v_1, v_2, \ldots, v_{nv}]$, where $v_i$ is the output of neuron
number $i$ of the hidden layer,

• $W^{xv} = [w_{ij}^{xv}]_{nv \times nx}$ is the matrix containing the weights
connecting the neurons of the input layer to the neurons
of the hidden layer, where $w_{ij}^{xv}$ is the weight of the synapse
connecting the $j$th entry of $x$, $x_j$, to the $i$th entry of $v$,
$v_i$. Using this notation, the output of neuron $i$ of the hidden
layer, $v_i$, is obtained as:

$$v_i = f\left(\sum_{j=1}^{nx} w_{ij}^{xv} x_j + \sum_{j=1}^{ny} w_{ij}^{yv} y_j\right) = f\left(w_i^{xv} x^T + w_i^{yv} y^T\right), \quad i = 1, 2, \ldots, nv, \qquad (6)$$

where $w_i^{xv} = [w_{i1}^{xv}, w_{i2}^{xv}, \ldots, w_{i,nx}^{xv}]$, $w_i^{yv} = [w_{i1}^{yv}, w_{i2}^{yv}, \ldots, w_{i,ny}^{yv}]$, and $f(\cdot)$ is the activation function
of the neurons of the hidden layer.

• $W^{vz} = [w_{ij}^{vz}]_{nz \times nv}$ is the matrix containing the weights
connecting the neurons of the hidden layer to the neurons
of the output layer, where $w_{ij}^{vz}$ is the weight of the synapse
connecting the $j$th entry of $v$, $v_j$, to the $i$th entry of $z$,
$z_i$.

• $z_i$ is the output of the $i$th neuron of the output layer, which
can be expressed as follows:

$$z_i = f\left(\sum_{j=1}^{nv} w_{ij}^{vz} v_j\right) = f\left(w_i^{vz} v^T\right), \quad i = 1, 2, \ldots, nz, \qquad (7)$$

where $w_i^{vz} = [w_{i1}^{vz}, w_{i2}^{vz}, \ldots, w_{i,nv}^{vz}]$.

Figure 3 shows the realization of the ANN shown in Fig. 2.
In this figure, the matrices of synaptic weights are implemented
using the crossbar structure, and it is assumed that the crossbars
generate the sum of products of the input
variables and the weights stored at the cross-points. For example,
the signal entering each neuron of the hidden layer is equal to
the sum of the products of the signals at the input layer and the
synaptic weights stored at the cross-points (i.e., the signal
entering the $i$th neuron of the hidden layer is equal to
$w_i^{xv} x^T + w_i^{yv} y^T$). In this figure, typical values are assigned
to the synaptic weights and input data such that the height of
each bar is proportional to the value of the corresponding
variable or weight.
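The computation performed by the two crossbars (sum of products at each row, followed by the activation function of the neuron) can be written as a minimal sketch. The logistic activation with the hypothetical parameters `gain` and `offset`, and the function names `dot`, `logistic` and `forward`, are assumptions for illustration; the paper does not fix a particular $f$ at this point.

```python
import math

def dot(w, u):
    """Sum of products produced by one crossbar row."""
    return sum(wi * ui for wi, ui in zip(w, u))

def logistic(s, gain=4.0, offset=1.0):
    """An assumed soft-thresholding activation; gain and offset are illustrative."""
    return 1.0 / (1.0 + math.exp(-gain * (s - offset)))

def forward(x, y, W_xv, W_yv, W_vz, f=logistic):
    """Hidden outputs v and network outputs z for inputs x and y."""
    v = [f(dot(wx, x) + dot(wy, y)) for wx, wy in zip(W_xv, W_yv)]
    z = [f(dot(wz, v)) for wz in W_vz]
    return v, z

# Small hypothetical example with non-negative weights and inputs.
x, y = [1.0, 0.0], [0.0, 1.0]
W_xv = [[1.0, 0.0], [0.0, 1.0]]
W_yv = [[0.0, 1.0], [1.0, 0.0]]
W_vz = [[1.0, 0.0]]
v, z = forward(x, y, W_xv, W_yv, W_vz)
print(v, z)
```

Here the first hidden neuron receives a large input because both x and y match its stored rows, whereas the second receives none.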

Interestingly, the function of the structure shown in Fig. 2 can
be interpreted in another way. In fact, similar to the case shown
in Fig. 1, $x_1, \ldots, x_{nx}$ and their corresponding vertical wires
in the first crossbar can be considered as representatives of
the different values or concepts that the linguistic input
variable $x$ can take. We have a similar situation for the variables
$y$ and $z$. In this case, at the input and output terminals and at
the rows of the structure we have constructed the discrete
universal sets of the variables $x$, $y$ and $z$. Now, the values
assigned to $x_1, \ldots, x_{nx}$ can be considered as the points of the
membership function of the fuzzy input proposition "$x$ is $A'$".
In other words, it is assumed that the input applied to the $i$th
input neuron shows our confidence degree about the occurrence of
the concept assigned to that neuron for the observed fuzzy
input data. Similarly, the synaptic weights stored at each row
of the crossbar located between the input and hidden (hidden
and output) layers can be considered as the points of the
membership function of the antecedent (consequent) part of
the corresponding fuzzy rule. For example, Fig. 3 shows the
implementation of the fuzzy sets $A_1$ and $B_1$ (defined on the
universal sets of $x$ and $y$, respectively) based on the coefficients
stored at the lowest row of the weight matrices $W^{xv}$ and $W^{yv}$,
respectively. Clearly, in this structure any new fuzzy rule can
be added to the fuzzy rule-base simply by adding a new row to
the first structure and then adjusting the weights to appropriate
values.

Now we can explain the function of the first part of the ANN
shown in Fig. 3, which is mathematically described in (6).
Since we assumed that all inputs and synaptic weights represent
confidence degrees and are consequently non-negative,
we can conclude that the dot product of $x$ and $w_i^{xv}$
(as well as that of $y$ and $w_i^{yv}$) indicates the similarity between
these two vectors (or the concepts represented by these membership
functions). For example, the larger the value
of $w_i^{xv} x^T$, the larger the similarity between $w_i^{xv}$ and $x$, and
therefore the larger our confidence degree about the occurrence
of the predefined concept $w_i^{xv}$ at the input variable $x$. This
fact provides a new explanation for the task of the
thresholding function in ANNs. To make it clear, consider
again the equation that relates the output of the neurons of the
hidden layer to the inputs and synaptic weights:

$$v_i = f\left(w_i^{xv} x^T + w_i^{yv} y^T\right), \quad i = 1, 2, \ldots, nv. \qquad (8)$$

In the above equation, without the thresholding function $f$
it is not possible to determine whether the output
is caused by only one of the inputs or by both of them. With a
suitable choice of this function, the output of neuron $i$ is
activated only when both $x$ and $y$ are similar to $w_i^{xv}$ and $w_i^{yv}$,
respectively. Clearly, activation of the $i$th neuron of the hidden
layer indicates the simultaneous occurrence of the concepts
defined by $w_i^{xv}$ and $w_i^{yv}$. Hence, the role of the thresholding
function $f$ in (8) is somewhat similar to the role of AND
gates in logical circuits. However, since here the probability
of having a complete match is very low, a hard-thresholding
activation function cannot be used. We conclude that in Fig. 3
each row of the crossbar located between the input and hidden
layers implements the antecedent part of a fuzzy rule, and
the output of the corresponding neuron in the hidden layer shows
the amount of activation of that rule (fuzzy min-term) for the
given input.

Fig. 3. Simple realization of the ANN shown in Fig. 2 based on fuzzy concepts. In other words, this figure shows that the working
procedure of a conventional ANN can be interpreted similarly to that of a fuzzy inference system without changing its structure. In this figure, each row of the structure
implements a simple fuzzy rule or min-term. In fact, here we have assumed that the connection weight matrices model the universal sets of input and output
variables, and by programming their entries correctly, any fuzzy set can be created on these universal sets.

The task of the soft-thresholding function $f$ used in the
neurons of the hidden layer can also be interpreted in another way.
In Section II-B we mentioned that the amount of activation of
the $k$th fuzzy rule is proportional to the amount of overlap
between the subspaces $S_n^k$ and $S'_n$. So, the question here is:
how can we measure the amount of overlap between two
$n$-dimensional subspaces using typical integrated circuits
(which are actually two-dimensional devices)? A simple and
efficient answer is that we can measure the similarity in
each dimension separately and then combine the results (by
using a t-norm type operator) to obtain the total similarity
between $S_n^k$ and $S'_n$. We conclude that the thresholding
function $f$ in ANNs plays the role of the t-norm operator in
fuzzy inference systems. For example, the ANN shown in Fig. 3
first measures the similarity in
each dimension separately (by calculating $w_i^{xv} x^T$ and $w_i^{yv} y^T$
in the $x$ and $y$ dimensions, respectively) and then applies the
thresholding function to the sum of these two values to detect
the simultaneous occurrence of the predefined concepts (i.e., $w_i^{xv}$
and $w_i^{yv}$). As mentioned before, a good method to combine the
similarities obtained in each dimension is to use an
operator that acts somewhat like the fuzzy t-norm operator. More
precisely, if $a$ and $b$ are two variables of confidence-degree
type (which indicate the similarity between the input variables
and the antecedent part of the fuzzy rules stored at the rows of
the crossbar), the operator $T$ that combines them should have the
property:

$$\text{if } a \le b \text{ then } T(a, b) \le a. \qquad (9)$$
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But, assuming a = wf "x T and b = wf" y T the thresholding 
function given in ([8]) leads to T(a, b) = f(a+b) which, in gen- 
eral, does not satisfy (O. However, by suitable choice of / it 
can be observed that this function acts very similar to the fuzzy 
i-norm operator or the p-input logical AND gate. For example 
in Fig. [4] we have compared the results obtained by applying 
four different operators defined as: Ti(a, b) — min(a, b), 
T 2 {a,b) = (a + b) 3 , T 3 (a,b) = (a + 6) 9 , and T 4 (a,b) = 
tansig((a + b) - 3) = 2/ (1 + exp (-2 * (a + b - 3))) - 1 to 
the operands a and b such that < a, b < 1. Note that in this 
figure output of each operator is normalized such that it lies 
between and 1. As it can be seen in this figure, operators 
Ti and T4 (which is a very common activation function in 
ANNs) acts similar to fuzzy t-norm operator T\. In addition, 
Fig. |4(c)| shows that by increasing the power in the definition 
of operator T2 this operator can be reduced to the 2-input 
logical AND gate. 
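The comparison described above can be reproduced numerically. The following is a minimal sketch, assuming an 11-point grid on [0, 1] and a linear min-max normalization of each operator (the grid size, the `normalize` helper and the two probe points are illustrative choices, not taken from the paper).

```python
import numpy as np

def normalize(z):
    """Rescale an array linearly so that its values span [0, 1] (assumed normalization)."""
    return (z - z.min()) / (z.max() - z.min())

def tansig(s):
    """Hyperbolic tangent sigmoid: 2 / (1 + exp(-2 s)) - 1."""
    return 2.0 / (1.0 + np.exp(-2.0 * s)) - 1.0

# Operands a and b on a grid covering 0 <= a, b <= 1.
grid = np.linspace(0.0, 1.0, 11)
a, b = np.meshgrid(grid, grid, indexing="ij")

operators = {
    "T1": np.minimum(a, b),
    "T2": (a + b) ** 3,
    "T3": (a + b) ** 9,
    "T4": tansig((a + b) - 3.0),
}
normalized = {name: normalize(z) for name, z in operators.items()}

# Probe 1: both operands high (a = b = 1). Probe 2: only one high (a = 1, b = 0).
both_high = {name: z[-1, -1] for name, z in normalized.items()}
one_high = {name: z[-1, 0] for name, z in normalized.items()}

for name in operators:
    print(name, "both high:", both_high[name], "one high:", one_high[name])
```

With one operand at 1 and the other at 0, the normalized $T_2$ still returns 1/8, which is why $f(a + b)$ does not satisfy (9) in general; raising the power to 9 pushes that value much closer to the zero returned by $\min$.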

Now we can study the part of the circuit located between the hidden and output layers. In this part of the circuit, the weight matrix $W^{vz}$ is implemented using the crossbar structure, as in the previous discussions. It is also assumed that the value of each neuron in the output layer indicates the confidence degree in a certain point of the universal set of the output variable (or in the concept represented by that output neuron). Hence, the values generated by the neurons of the output layer constitute the membership function of the output fuzzy number. Note that the output of the ANN shown in Fig. 3 can still be connected to the input of another ANN of this type. The weight matrix $W^{vz}$ determines which outputs should be activated, and to what extent, when a fuzzy min-term is activated (or equivalently, when certain concepts occur simultaneously at the input).
Note that in this structure we have actually assumed that learning is the procedure of making appropriate connections between fuzzy min-terms and output concepts (and not between input and output fuzzy concepts). It follows that the elements of the weight matrix $W^{vz}$ can easily be adjusted by applying the Hebbian learning method (as described in Section II-A) to the neurons of the output and hidden layers. This means that when, for a certain input-output data, a neuron in the output layer is activated simultaneously with a neuron in the hidden layer, the synaptic weight connecting these two neurons should be amplified, and the amount of this amplification should be proportional to the strength of activation of the two neurons. Interestingly, in [10] we showed that this process is also equivalent to the creation of a fuzzy relation [11] between min-terms and output concepts. Note that the membership function of the consequent part of each fuzzy rule is constructed on a row of the second crossbar, and the final fuzzy output is equal to the weighted sum (aggregation) of these membership functions, where the weight assigned to each membership function is proportional to the strength of activation of the antecedent part of that fuzzy rule.

The interpretation proposed in this section for the function of ANNs has many advantages over the classical descriptions. First, in this method all input data, output data and synaptic weights are non-negative. Second, unlike other training methods (such as back-propagation), in the proposed structure the appropriate values of the synaptic weights can be obtained without the need for optimization methods. The main reason for this is that the first crossbar of the ANN shown in Fig. 3 implements the fuzzy min-terms, which can be constructed independently of the definition of the final input-output relation.

To sum up, in this section we showed that the operation 
of ANNs can be interpreted in a quite different way without 
applying any changes to the classical structure. Moreover, we 
showed that the two-layer ANN can be considered as a fuzzy 
inference system with fuzzy input and fuzzy output. In the next 
section we present a method of hardware implementation and 
training the proposed neuro-fuzzy computing system, which 
is obtained by inspiration from logical circuits, fuzzy logic 
and ANNs. We will also show that the proposed neuro-fuzzy 
system is really effective and can be used to solve some of 
the complicated engineering problems. 

III. Realization, training and application of the proposed neuro-fuzzy system

In this section we explain the hardware implementation, training and some applications of our proposed neuro-fuzzy system, which differs considerably from existing methods. It will also be shown that the proposed structure has the advantage that it can be designed to deal effectively with massive input-output data.

The overall structure of the proposed neuro-fuzzy computing system is the same as the one depicted in Fig. 3, which can also be considered as a two-layer network. In this figure, without loss of generality, it is assumed that x and y are scalar inputs and z is the scalar output. Note that this system is actually a dynamic structure which is incomplete at the beginning and becomes more and more complete as it is trained with new data. This means that the hidden layer does not have any neurons before training; neurons are generated as the system is subjected to input-output training data. For this purpose, first the neurons of the input layer should be divided into groups such that the number of groups equals the number of input variables. The number of neurons in each group can, in general, differ from the others and depends on the accuracy required to model the variable corresponding to that group. In fact, in each group one neuron is required for each concept or value that the input variable can take. For example, 101 neurons are required to deal with a variable that varies between 0 and 10 with a resolution of 0.1. In this case, the first neuron represents the concept "x = 0", the second neuron represents the concept "x = 0.1", and so on. Clearly, by increasing the number of neurons in the input layer any input variable can be represented with the desired accuracy. Similarly, one neuron should be provided in the output layer for each distinct value of the output signal. For the given input, the signal generated at each neuron of the output layer shows the confidence degree of the system in the particular value or concept assigned to that neuron. Hence, the output of this system is actually fuzzy data, which can be converted to a crisp number by applying any defuzzification method.
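The one-neuron-per-concept encoding can be illustrated with the 101-neuron example above. This is a minimal sketch under stated assumptions: the triangular membership shape and the centroid defuzzifier are illustrative choices (the text allows any membership function and any defuzzification method), and all function names are hypothetical.

```python
import numpy as np

# Universal set of x: concepts 0, 0.1, ..., 10, one input neuron per concept.
universe = np.linspace(0.0, 10.0, 101)

def crisp_to_neurons(value, universe):
    """Activate only the neuron whose assigned concept is closest to `value`."""
    signal = np.zeros_like(universe)
    signal[np.argmin(np.abs(universe - value))] = 1.0
    return signal

def triangular_to_neurons(center, half_width, universe):
    """Sample a triangular membership function on the universal set (assumed shape)."""
    return np.clip(1.0 - np.abs(universe - center) / half_width, 0.0, 1.0)

def centroid(membership, universe):
    """Centroid defuzzification (assumed): membership-weighted mean of the concepts."""
    return float(np.dot(membership, universe) / membership.sum())

x_crisp = crisp_to_neurons(0.1, universe)            # concept "x = 0.1" -> second neuron
x_fuzzy = triangular_to_neurons(4.0, 1.0, universe)  # fuzzy input "about 4"
print(len(universe), int(np.argmax(x_crisp)), centroid(x_fuzzy, universe))
```

A finer resolution only changes the length of `universe`; the rest of the structure is unaffected.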

It is assumed that the output of each neuron in the input layer is exactly equal to its input (i.e., it has an identity activation function), and the output of each neuron in the output layer is equal to the weighted sum of the signals entering it (again with an identity activation function). As can be observed in Fig. 3, for each of the input variables a fuzzy set with any desired membership function can be constructed on each row of the crossbar located between the input and hidden layers. For example, in this figure the fuzzy sets $A_1$ and $A_2$ are constructed on the universal set of x and the fuzzy sets $B_1$ and $B_2$ are constructed on the universal set of y. It will be shown later that the fuzzy sets constructed on the rows of this crossbar constitute the antecedent part of if-then type fuzzy training data.

Fig. 4. Comparison between different operators: (a) $T_1(a, b) = \min(a, b)$, (b) $T_2(a, b) = (a + b)^3$, (c) $T_3(a, b) = (a + b)^9$, (d) $T_4(a, b) = \mathrm{tansig}((a + b) - 3)$. This figure verifies that the proposed activation function, i.e. $(a + b)^p$, acts similarly to other t-norm operators. In addition, by comparing the shapes of operators $T_2$ and $T_4$ it can be said that the activation function of neurons in ANNs is, to some extent, a t-norm operator.

A. Response of the proposed neuro-fuzzy computing system to the given input

In this section we assume that the proposed neuro-fuzzy computing system is designed and fully trained, and we discuss only the computation of its output for the given input. The training procedure is then discussed in Section III-B. Note that, as will be shown later, the proposed system can be trained during its ordinary operation; the training procedure is presented in a separate section only for the sake of clarity.

In Fig. 3, assume that $x_i^*$ is the value of the signal entering the ith neuron of the group corresponding to variable x, $x_i$ is the concept assigned to the ith neuron of this group, $y_i^*$ is the value of the signal entering the ith neuron of the group corresponding to variable y, $y_i$ is the concept assigned to the ith neuron of this group, $n_x$ is the number of neurons used to cover the universal set of the fuzzy variable x, $n_y$ is the number of neurons used to cover the universal set of the fuzzy variable y, $z_i^*$ is the value observed at the ith neuron of the output layer, $z_i$ is the concept assigned to the ith neuron of the output layer, $n_z$ is the number of neurons used to cover the universal set of the fuzzy variable z, $v_i$ is the output of the ith neuron of the hidden layer, $N_k$ is the number of neurons in the hidden layer after applying k sets of input-output training data, $W^{xv} = [w_{ij}^{xv}]_{N_k \times n_x}$ is the weight matrix containing the coefficients connecting the neurons corresponding to variable x in the input layer to the neurons of the hidden layer, $W^{yv} = [w_{ij}^{yv}]_{N_k \times n_y}$ is the weight matrix containing the coefficients connecting the neurons corresponding to variable y in the input layer to the neurons of the hidden layer, and $W^{vz} = [w_{ij}^{vz}]_{n_z \times N_k}$ is the weight matrix containing the coefficients connecting the neurons of the hidden layer to the neurons of the output layer. Using this notation, if we denote the input vectors as $x^* = [x_1^*, x_2^*, \ldots, x_{n_x}^*]$ and $y^* = [y_1^*, y_2^*, \ldots, y_{n_y}^*]$, the output of the ith neuron in the hidden layer is obtained as

$v_i = f(w_i^{xv} x^{*T} + w_i^{yv} y^{*T}) = (w_i^{xv} x^{*T} + w_i^{yv} y^{*T})^p$, $i = 1, 2, \ldots, N_k$, $p > 1$, (10)

where $f(\cdot)$ is the activation function of the neurons, $w_i^{xv}$ is the ith row of $W^{xv}$ and $w_i^{yv}$ is the ith row of $W^{yv}$. The main difference between (10) and similar equations in classical ANNs is the activation function, which is taken as $f(s) = s^p$ here. In the following we discuss the reason for using this type of activation function in more detail.
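The hidden-layer computation with the power activation $f(s) = s^p$ can be sketched as follows. The matrices below are small illustrative values (two stored min-terms on five-point universal sets), not data from the paper, and the function name is hypothetical.

```python
import numpy as np

def hidden_layer(W_xv, W_yv, x_star, y_star, p=9):
    """v_i = (w_i^xv . x* + w_i^yv . y*) ** p, evaluated for every stored row i."""
    s = W_xv @ x_star + W_yv @ y_star   # per-dimension similarities, summed
    return s ** p                       # power activation f(s) = s^p

# Two stored min-terms; each row holds one membership function per input variable.
W_xv = np.array([[1.0, 0.5, 0.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0, 0.5, 1.0]])
W_yv = np.array([[0.0, 0.0, 0.0, 0.5, 1.0],
                 [1.0, 0.5, 0.0, 0.0, 0.0]])

# Fuzzy inputs that coincide with the first stored min-term in both dimensions.
x_star = np.array([1.0, 0.5, 0.0, 0.0, 0.0])
y_star = np.array([0.0, 0.0, 0.0, 0.5, 1.0])

v = hidden_layer(W_xv, W_yv, x_star, y_star, p=9)
print(v)
```

Here the first hidden neuron receives a similarity of 1.25 in each dimension and the second receives none, so only the first min-term is activated.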

Consider again the neuro-fuzzy system shown in Fig. 3. As mentioned before, each row of the first crossbar contains the membership functions of two fuzzy sets, and each fuzzy set is constructed by setting the synaptic weight stored at each cross-point equal to the value of the corresponding membership function at that point. According to the previous discussions, the data stored in each row of the first crossbar constitutes a fuzzy min-term, which approximately refers to a unique combination of the two fuzzy input variables. It follows that in practice each distinct input-output training data can form the antecedent part of an AND-type fuzzy rule, which is stored in one row of the first crossbar. For example, the second row of the first crossbar in Fig. 3 implements the antecedent part of the following fuzzy rule:

IF x is $A_2$ AND y is $B_2$ THEN z is $C_2$. (11)

In order to evaluate the output of each neuron in the hidden layer for the given fuzzy inputs, first the similarity between the membership function of each fuzzy input and the corresponding pattern (membership function) stored in each row of the crossbar is determined by calculating the inner product of these two membership functions. For example, in the second row of the crossbar this procedure is equivalent to calculating $w_2^{xv} x^{*T}$ for variable x and $w_2^{yv} y^{*T}$ for variable y. Now, in order to determine the output of the corresponding neuron of the hidden layer, which shows the amount of activation of the corresponding fuzzy rule, the values obtained for $w_2^{xv} x^{*T}$ and $w_2^{yv} y^{*T}$ should be combined using a t-norm type operator. In fact, the operator that combines $w_2^{xv} x^{*T}$ and $w_2^{yv} y^{*T}$ should have the property that it generates considerably large outputs only when both of these numbers are considerably large. One possible approach for the hardware implementation of such a t-norm is to use the structure shown in Fig. 5. The main drawback of this structure, as well as of the logical circuit shown in Fig. 1, is that the number of inputs of the t-norm operator depends on the number of fuzzy inputs of the system. Note that in order to simulate the behavior of the brain we need to develop structures with very large numbers of inputs and outputs, where each output is a function of only a few, but a varying number of, inputs. But if in the structure shown in Fig. 3 we use different t-norm operators with different numbers of inputs, the resulting hardware will be very complicated and inefficient.

Fig. 5. This figure shows how the activation function of neurons can be implemented when the activation function is modeled by a t-norm operator. This implementation has the drawback that it depends completely on the number of input variables.

Equation (10) proposes that the t-norm of $w_i^{xv} x^{*T}$ and $w_i^{yv} y^{*T}$ be defined as

$t(w_i^{xv} x^{*T}, w_i^{yv} y^{*T}) = (w_i^{xv} x^{*T} + w_i^{yv} y^{*T})^p$,

where $p > 1$ is an arbitrary integer. As explained before and shown in Fig. 4, it can easily be verified that for $p \gg 1$ the output of each neuron when both $w_i^{xv} x^{*T}$ and $w_i^{yv} y^{*T}$ are large is much bigger than when only one of them is large. Although the activation function defined in (10) does not satisfy all requirements of a t-norm operator, we will show in Section V that it works very well in practice. Moreover, unlike other activation functions used in classical ANNs, it has no threshold or other parameters to tune. Another advantage of this activation function is that it leads to a structure whose hidden layer does not change when the number of system inputs increases.
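As a rough worked example of this AND-like behavior, assume (purely for illustration) that each per-dimension similarity is normalized so that a full match gives 1 and no match gives 0. The ratio between the "both match" and "only one matches" outputs is then:

```latex
\frac{f(1 + 1)}{f(1 + 0)} = \frac{2^{p}}{1^{p}} = 2^{p},
\qquad
p = 3 \;\Rightarrow\; 2^{3} = 8,
\qquad
p = 9 \;\Rightarrow\; 2^{9} = 512 .
```

So with the exponents used for $T_2$ and $T_3$, a neuron whose min-term matches in both dimensions responds 8 or 512 times more strongly than one that matches in a single dimension, and the gap grows exponentially with $p$.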

Considering the output of the hidden layer as $v = [v_1, v_2, \ldots, v_{N_k}]$, the output of the ith neuron in the output layer in Fig. 3 is obtained as

$z_i^* = \sum_{j=1}^{N_k} w_{ij}^{vz} v_j$, $i = 1, 2, \ldots, n_z$. (12)

Equation (12) can be expressed in vector form as

$z^* = v (W^{vz})^T$, (13)

where the dimension of $W^{vz}$ is $n_z \times N_k$.

As can be observed, this part of the system acts very similarly to the second layer of classical ANNs; the only difference is that the activation function of the neurons in the output layer of Fig. 3 is the identity. In fact, the output vector $z^*$ determines the membership function of the resulting fuzzy output for the given fuzzy inputs.
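The output stage is a plain weighted sum with identity activation. The sketch below uses illustrative values for the second weight matrix and the hidden outputs, and assumes centroid defuzzification as one possible way to obtain a crisp number; the function names are hypothetical.

```python
import numpy as np

def output_layer(W_vz, v):
    """z* = v (W^vz)^T, with W^vz of shape (n_z, N_k) and identity activation."""
    return v @ W_vz.T

def centroid(membership, universe):
    """Centroid defuzzification (assumed choice of defuzzifier)."""
    return float(np.dot(membership, universe) / membership.sum())

z_universe = np.array([0.0, 1.0, 2.0, 3.0])   # concepts assigned to the output neurons
W_vz = np.array([[1.0, 0.0],                  # n_z = 4 output neurons,
                 [0.5, 0.0],                  # N_k = 2 hidden neurons (min-terms)
                 [0.0, 0.5],
                 [0.0, 1.0]])
v = np.array([3.0, 1.0])                      # activation of the two min-terms

z_star = output_layer(W_vz, v)                # membership function of the fuzzy output
z_crisp = centroid(z_star, z_universe)
print(z_star, z_crisp)
```

Each column of `W_vz` holds the consequent membership function of one min-term, so `z_star` is their aggregation weighted by the activations in `v`.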

B. Training the proposed neuro-fuzzy computing system 

In the previous section we showed that the working procedure of all parts of the proposed neuro-fuzzy computing system, except the activation function of the neurons in the hidden layer, is similar to that of a classical ANN. However, what makes the proposed system considerably different and efficient is its training procedure. More precisely, the proposed system is trained in a supervised manner based on the given input-output data and, as will be shown later, the synaptic weights and other parameters of the system can be trained without the need for any optimization method. Without loss of generality, consider again the neuro-fuzzy computing system shown in Fig. 3 and assume that the system has already been subjected to k input-output training data and currently has $N_k \le k$ neurons in the hidden layer. In the following we discuss how to train the system when it is subjected to new training data.

Assume that we are given the $(k + 1)$th input-output training data and our goal is to train the system such that it learns this new data. Note that since the system under consideration has two inputs and one output, the new training data must consist of two fuzzy sets as input variables and a fuzzy set as output variable. In the following, however, for the sake of simplicity we assume that the input training data consists of two fuzzy sets while the output training data is a crisp number. Denote the fuzzy training inputs as $x^*$ and $y^*$, and apply them to the corresponding neurons of the input layer (recall that each set of neurons in the input layer covers the universal set of the corresponding fuzzy input). According to the previous discussions, applying these fuzzy inputs to the system activates the output of each neuron in the hidden layer to a certain degree. Then the outputs of the hidden-layer neurons are combined according to (13) to form the membership function of the output variable. Defuzzifying this fuzzy output leads to a crisp number as the final output. If the difference between this number and the output training data is less than a predefined threshold, we can conclude that the system has already been trained with very similar training data and consequently we do not need to train the system with this new data. Two points should be noted here. First, since we assumed that the output training data is a crisp number, we have to defuzzify the output of our neuro-fuzzy computing system in order to be able to compare it with the training data. However, in some applications the output training data is itself a fuzzy concept (hereafter denoted as $u^*$) and consequently no defuzzification is required at the output layer. In this case the difference between the output training data and the output of the neuro-fuzzy computing system can be measured by calculating, e.g., the inner product of these two output vectors (the membership functions of the two fuzzy numbers). Second, the value of this threshold can easily be determined according to the accuracy needed by the user, without the need for any optimization. Obviously, decreasing the value of this parameter increases the accuracy of the system at the cost of using more fuzzy min-terms (or equivalently, more neurons in the hidden layer).

When the difference between the output generated by the neuro-fuzzy computing system and the given training output is larger than the predefined threshold value, the system should be trained with the new training data. In fact, the main reason the system cannot generate an accurate output for the given input data is that it has not been subjected to similar data before. In this case we first create a new fuzzy min-term by adding a new neuron to the hidden layer, and then we apply appropriate changes to the synaptic weights stored in the second crossbar to bring the final output closer to the output training data.

The simplest way to create a new fuzzy min-term (to cover the subspace specified by this new data) is to store the membership functions of the new input data at the cross-points connecting the neurons of the input layer to the new neuron added to the hidden layer. For this purpose we can simply store $x^*$ and $y^*$ (which are the membership functions describing the input data) in the new row added to the first crossbar. Hence, after adding neuron number $N_k + 1$ to the hidden layer of the system we set

$w_{N_k+1}^{xv} = x^*$, (14)

and

$w_{N_k+1}^{yv} = y^*$. (15)

In this case, the output of this newly added hidden neuron becomes active only when the applied input data is close enough to the trained data characterized by $x^*$ and $y^*$. Now, in order to apply appropriate changes to the synaptic weights stored in the second crossbar (to form connections between min-terms and output concepts), first we calculate the outputs of the neurons of the hidden layer as

$v_i = f(w_i^{xv} x^{*T} + w_i^{yv} y^{*T}) = (w_i^{xv} x^{*T} + w_i^{yv} y^{*T})^p$, $i = 1, 2, \ldots, N_k + 1$, $p > 1$. (16)

Then we update the synaptic weights stored in the weight matrix $W^{vz}$ using the Hebbian learning method. In this method the weight of the synapse connecting two active neurons (one in the hidden layer and the other in the output layer) is amplified in proportion to the amount of activation of these two neurons. More precisely, after calculating the outputs of the neurons in the hidden layer, the synaptic weight connecting the jth neuron of the hidden layer to the ith neuron of the output layer is updated as follows:

$w_{ij}^{vz} \leftarrow w_{ij}^{vz} + \alpha\, t(v_j, u_i)$, $i = 1, \ldots, n_z$, $j = 1, \ldots, N_k + 1$, (17)

where $u_i$ is the value of the output training data, i.e. $u^*$, at the ith output neuron, $\alpha$ is the learning coefficient, and $t$ is the t-norm operator used to implement the Hebbian learning method. Note that since the latest fuzzy min-term added to the neuro-fuzzy computing system is exactly the same as the membership functions of the fuzzy inputs, the output of the $(N_k + 1)$th neuron of the hidden layer is much larger than the outputs of the other neurons in this layer when the system is subjected to the latest inputs, $x^*$ and $y^*$. Hence, according to (17), the synaptic weights connecting the $(N_k + 1)$th neuron of the hidden layer to the neurons of the output layer (i.e., the coefficients in the $(N_k + 1)$th column of $W^{vz}$) are the ones mainly affected by the Hebbian learning method. By repeating the above procedure for each new input-output training data, the structure becomes more and more complete. An interesting point about the training of this system is that, unlike many other training methods, each input-output data is applied only once, and consequently problems like over-training do not occur. Moreover, according to (17), this system can be trained using both fuzzy and crisp data without the need for any optimization algorithm.
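The training procedure described above (novelty check, adding a row to the first crossbar, Hebbian update of the second) can be sketched in software as follows. This is only a sketch under several assumptions that are not fixed by the text: the defuzzifier is the centroid, the t-norm in the Hebbian update is the product, hidden outputs are divided by their maximum before the update, crisp targets are encoded as one-hot vectors, and inputs are non-zero. The class and helper names are hypothetical.

```python
import numpy as np

class NeuroFuzzySketch:
    """Incremental two-crossbar system trained with crisp targets (illustrative only)."""

    def __init__(self, z_universe, n_x, n_y, p=9, alpha=1.0, threshold=0.5):
        self.z_universe = np.asarray(z_universe, dtype=float)
        self.W_xv = np.zeros((0, n_x))                    # first crossbar, x part
        self.W_yv = np.zeros((0, n_y))                    # first crossbar, y part
        self.W_vz = np.zeros((self.z_universe.size, 0))   # second crossbar
        self.p, self.alpha, self.threshold = p, alpha, threshold

    def hidden(self, x, y):
        # Power activation applied to the summed per-dimension similarities.
        return (self.W_xv @ x + self.W_yv @ y) ** self.p

    def predict(self, x, y):
        z = self.W_vz @ self.hidden(x, y)                 # fuzzy output z*
        if z.sum() == 0.0:
            return None                                   # no min-term is activated
        return float(z @ self.z_universe / z.sum())       # centroid (assumed)

    def train_one(self, x, y, target):
        guess = self.predict(x, y)
        if guess is not None and abs(guess - target) < self.threshold:
            return False                                  # similar data already learned
        # Store the input membership functions on a new row of the first crossbar.
        self.W_xv = np.vstack([self.W_xv, x])
        self.W_yv = np.vstack([self.W_yv, y])
        self.W_vz = np.hstack([self.W_vz, np.zeros((self.z_universe.size, 1))])
        v = self.hidden(x, y)
        v = v / v.max()                                   # assumed normalization
        u = np.zeros(self.z_universe.size)                # crisp target as one-hot u*
        u[np.argmin(np.abs(self.z_universe - target))] = 1.0
        # Hebbian update with t(v_j, u_i) = v_j * u_i (product t-norm, assumed).
        self.W_vz += self.alpha * np.outer(u, v)
        return True

def one_hot(index, size=5):
    e = np.zeros(size)
    e[index] = 1.0
    return e

system = NeuroFuzzySketch(z_universe=[0.0, 1.0, 2.0, 3.0], n_x=5, n_y=5)
added = [
    system.train_one(one_hot(0), one_hot(0), 0.0),
    system.train_one(one_hot(4), one_hot(4), 3.0),
    system.train_one(one_hot(0), one_hot(0), 0.0),   # repeat of the first sample
]
print(added, system.W_xv.shape)
```

In this toy run the third sample repeats the first one, so it passes the threshold test and no new min-term is created; this is the mechanism by which the stored structure can stay smaller than the training set.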

It may seem that the proposed neuro-fuzzy computing system just stores the input-output training data and does not perform any computation. However, when the possible overlaps between fuzzy min-terms are taken into account, the size of the data stored in the system can be much smaller than the size of the input-output training data, especially when dealing with long-term records. In most practical cases the rate at which new min-terms are added to the system decreases (almost monotonically) as the training procedure continues. Finally, note that another advantage of the proposed neuro-fuzzy computing system is that the memory and computing units are merged, as they are in living organisms.

IV. Hardware implementation of the proposed neuro-fuzzy computing system

In this section, we discuss the advantages of the proposed computing system from the hardware implementation point of view. During the explanation of the proposed method in the previous sections the reader may have recognized that the suggested learning algorithm is not perfect and can easily be improved in different ways (e.g., by modifying the membership functions of the programmed fuzzy sets). However, in the following we show that the main reason for suggesting this particular structure is its simple hardware implementation. For this purpose, first recall that our main goal is to design a computing system capable of emulating the computing power of the human brain. Therefore, considering the structural complexity and size of real neurons, simplicity of the hardware is of critical importance for its success. This means that the final system should be easily expandable by merging basic computing units, and it should have a simple content- or concept-based learning method (similar to content-based addressable memories with computing ability) which can easily be mapped into hardware. Hence, any kind of optimization method is evidently ruled out, since its hardware implementation is very costly and inefficient even for a small network.

Fig. 6. Memristor crossbar-based circuit proposed to perform vector-matrix multiplication, which can be used for the hardware implementation of the proposed neuro-fuzzy system.

Now let us see how the proposed system can be implemented in practice. As stated in Section III-A, during the ordinary operation of the system both parts of the structure shown in Fig. 3 perform a simple vector-matrix multiplication (see Eqs. (10) and (13)). Note that in this case the t-norm operator, or the activation function of the neurons, is applied to the results of these vector-matrix multiplications. Figure 6 shows the proposed circuit for performing vector-matrix multiplication. This circuit consists of a simple memristor crossbar [22] in which each row is connected to the virtually grounded terminal of an operational amplifier that plays the role of a neuron with identity activation function. The memristor crossbar structure has two sets of conductive parallel wires crossing each other perpendicularly, such that at each crosspoint a semiconductor device is fabricated between the two crossing wires. In the structure shown in Fig. 6 this semiconductor element is a memristive device [4], [23]. A memristive device is a nonlinear element whose resistance (known as memristance) can be tuned by applying a suitable voltage to the device [24]. Since there is a threshold in the physical model of the device, the amplitude of the applied voltage must be larger than this threshold in order to change the state (and consequently, the memristance) of the device [25].
Hence, assuming that the amplitude of the voltages applied to the circuit of Fig. 6 is below the threshold of the memristive devices, the output of the ith neuron (operational amplifier), $O_i$, can be written as:

$O_i = -R_f \sum_j \frac{I_j}{M_{ij}} = -R_f \sum_j G_{ij} I_j$, (18)

where $M_{ij}$ and $G_{ij}$ are the memristance and memductance (inverse of memristance) of the memristive device located at the crossing point of the ith row and the jth column of the crossbar, $R_f$ is the feedback resistor of the operational amplifiers and $I_j$ is the input voltage applied to the jth column of the crossbar. Note that since the memristance of the memristive devices does not change during this computation, in Eq. (18) they are treated as ordinary resistors. By comparing Eqs. (18) and (10) (or Eqs. (18) and (13)) it becomes clear that this structure is well suited to implementing the proposed method. For this purpose, the connection weight matrices (which contain the membership functions of the corresponding fuzzy sets) should be stored as the memductances of the memristive devices at the crosspoints of the crossbar. On the other hand, each neuron should compute the weighted sum of its inputs and then apply the activation function to the result. This can be done by applying a suitable nonlinear function, e.g. $f(\cdot) = (\cdot)^p$, to the output of the operational amplifiers in Fig. 6. Finally, it is evident that the computing structure shown in Fig. 3 can be implemented by the series connection of two of these memristor crossbar structures (i.e., connecting the outputs of one of these circuits directly to the inputs of the other).
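The vector-matrix multiplication performed by one crossbar can be sketched with an ideal circuit model. The sketch assumes an ideal inverting summing amplifier on each row (hence the minus sign), memristors that behave as fixed resistors below threshold, and no wire resistance or sneak-path effects; the component values and function name are illustrative.

```python
import numpy as np

def crossbar_output(memristance, inputs, r_f):
    """Ideal row outputs: O_i = -R_f * sum_j I_j / M_ij (inverting summing amplifier)."""
    memductance = 1.0 / memristance           # G_ij = 1 / M_ij
    return -r_f * (memductance @ inputs)

R_F = 16e3                                    # feedback resistor in ohms (assumed value)
M = np.array([[16e3, 8e3],                    # memristance at row i, column j, in ohms
              [4e3, 16e3]])
I = np.array([0.1, 0.2])                      # sub-threshold column voltages in volts

O = crossbar_output(M, I, R_F)
weights = R_F / M                             # effective connection weights R_f / M_ij
print(O, weights)
```

Apart from the sign inversion introduced by the amplifier, each row output is the weighted sum of the column voltages, with weights set by the stored memductances.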

Now, consider the learning process described in Section III-B. As explained in that section, the learning process of the proposed method consists of two phases: (i) creation of min-terms (storing fuzzy sets) on the first crossbar (which connects the neurons of the input layer to the neurons of the hidden layer) and (ii) updating the connection weights stored at the crosspoints of the second crossbar (which connects the neurons of the hidden layer to the neurons of the output layer). In order to add a new min-term to the structure shown in Fig. 6, first a new row should be added to the first crossbar. For this purpose we can reserve, from the beginning, some of the rows of the first crossbar for storing upcoming data. In this case adding a new row is equivalent to tuning the memristance of the memristors located on an unused row to suitable values. In order to store weights on the newly added row (or equivalently, to store the membership functions of the fuzzy input sets) we interpret the value of the membership function at each point as a voltage signal and apply it to the corresponding column of the crossbar. Now, by grounding the new row while the other rows are connected to a high impedance, current passes through the memristors connected to this row. Assuming that all of these memristors initially have the same memristance, and considering that the current passing through a memristor is proportional to the amplitude of the voltage applied to it, it can be concluded that after a fixed programming time the conductance of each memristor on the newly added row is proportional to the amplitude of the voltage applied to the corresponding column (or equivalently, to the membership function of the fuzzy input at that column). This means that simply by applying the membership functions to the columns of the first crossbar while the newly added row is grounded, and then waiting for a specific time, a new min-term corresponding to the input training data is automatically added to the crossbar. Note that several methods have been proposed so far to change the memristance of specified memristors in a crossbar without altering the memristance of other semi-selected memristors [22].

Now, let us consider the hardware implementation of the second phase of the proposed learning method on the structure shown in Fig. 6, which is used to implement the second crossbar in Fig. 3 (from the hidden layer to the output layer). Here we want to update the corresponding weights (i.e., the memductances of the memristors) based on the given training data. It follows from Eq. (17) that for any given training data, the memductance (connection weight) of the memristor connecting a neuron of the hidden layer to a neuron of the output layer should be changed such that the amount of this change is proportional to the sum of the firing strengths of these two neurons. In order to implement this updating method in the memristive structure shown in Fig. 6, first we inject the input training data into the system and let the neurons of the hidden layer generate their output signals (see Eq. (10)). Then, we interpret the value of the membership function of the output fuzzy training data (i.e., $u^*$) at each point as a voltage signal and apply the negative of this voltage to the corresponding row of the crossbar. In this case, the current passing through the memristor connecting a hidden neuron to an output neuron is proportional to the voltage dropped across this memristor, which is equal to the sum of the absolute values of the voltages applied to the row and column of the crossbar between which this memristor is located. For example, for the memristor located at the crossing point of the ith row and the jth column of the crossbar this voltage is equal to $u_i + v_j$. Application of this voltage causes the memristance (memductance) of the memristor to decrease (increase), which increases the weight $w_{ij}$ stored in this memristor (see the relation expressed in Eq. (17)). Figure 7 shows how the weight stored in a typical memristor, i.e. $w_{ij} = R_f / M_{ij}$, changes versus the amplitude of the voltage $u_i + v_j$ when it is applied to the device for a specific period of time. In this figure, $R_f$ and the initial memristance of the memristor are both set equal to $R_{off}$ (the maximum memristance that the memristor can have).

Fig. 7. This figure shows how the weight stored in the memristor typically changes versus the applied voltage during the learning process. From this figure it can be seen that a single memristor acts as a t-norm operator: it changes the stored value (memristance of the memristor) significantly when the voltage across the device is large (so the probability that both voltages at the terminals of the memristor have large values is high as well). In this simulation the HP model [4] is used to simulate the memristor, and voltages are applied to the memristor for about 0.05 second.

By comparing Figs. 7 and 4 it can be inferred that each memristor
effectively applies a t-norm operator to the two voltages connected to
its terminals (hence the function $t(u_i, v_j)$ in Eq. (11)): when both
voltages have high values (i.e., the neurons fire simultaneously), the
weight stored in the memristor is increased much more than in the case
in which only one of these neurons fires. This means that at the end
of this learning procedure, we will see a strong connection only
between those hidden and output neurons that usually fire
simultaneously. To summarize, to carry out this learning process we
only need to apply the training data to the input and output terminals
of the structure and wait for some period of time. This will cause the
memristance of the memristors to change according to the Hebbian
learning rule, similar to what we had in Eq. (11). To conclude, by
using two of these memristor crossbar structures, connecting them to
each other correctly, and managing the amplitude and timing of the
applied voltages corresponding to the training data, the proposed
neuro-fuzzy system, or any rule-based fuzzy inference method, can be
constructed simply.
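
The weight update described in this section can be explored numerically. The following Python sketch is only an illustration under stated assumptions, not the simulation code used for Fig. 7. It assumes the linear-drift HP memristor model, for which a constant voltage V applied for a time T gives the closed form M(T)^2 = M_0^2 - 2kVT with k = mu_v R_on (R_off - R_on) / D^2, and it assumes that the weight is read as w = R_f / M. The device parameters (R_ON, R_OFF, D, MU_V) and the function names are hypothetical choices; only the pulse duration of 0.05 s and the setting R_f = M_0 = R_off follow the text.

```python
import math

# Illustrative HP-model parameters (assumed, not taken from the paper).
R_ON = 100.0        # minimum memristance (ohm)
R_OFF = 16_000.0    # maximum memristance (ohm)
D = 10e-9           # device thickness (m)
MU_V = 1e-14        # dopant mobility (m^2 / (V s))
K = MU_V * R_ON * (R_OFF - R_ON) / D**2   # drift constant (ohm^2 / (V s))


def memristance_after_pulse(m0, voltage, duration):
    """Memristance after a constant-voltage pulse (linear-drift HP model)."""
    m_squared = m0**2 - 2.0 * K * voltage * duration
    m = math.sqrt(max(m_squared, R_ON**2))   # clip at the lower bound R_ON
    return min(m, R_OFF)                      # clip at the upper bound R_OFF


def weight(m, r_f=R_OFF):
    """Weight read from a memristor, assuming w = R_f / M."""
    return r_f / m


def hebbian_update(m, u, v, duration=0.05):
    """Update a crossbar of memristances m[i][j] with one training pair.

    Row i is driven with u[i] and column j with -v[j], so the voltage
    across the memristor at crossing point (i, j) is u[i] + v[j].
    """
    return [
        [memristance_after_pulse(m[i][j], u[i] + v[j], duration)
         for j in range(len(v))]
        for i in range(len(u))
    ]


if __name__ == "__main__":
    u = [1.0, 0.0]   # firing strengths of two hidden neurons
    v = [1.0, 0.0]   # target membership values at two output neurons
    m = [[R_OFF, R_OFF], [R_OFF, R_OFF]]
    for i, row in enumerate(hebbian_update(m, u, v)):
        for j, m_ij in enumerate(row):
            print(f"w[{i}][{j}] = {weight(m_ij):.4f}")
```

Under these assumed parameters the weight is a convex function of the applied voltage, so the increment obtained when both terminals are driven is more than twice the increment obtained when only one terminal is driven. This is the t-norm-like behaviour discussed above.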

V. Simulation results 

In this section we show the high potential of the proposed 
neuro-fuzzy computing system for solving different engineer- 
ing problems in the field of modeling and classification. One 
important capability of the proposed system is to model multi- 
variable mathematical functions. To show this, consider the 
functions: 



$$g_1(x,y) = 10.391\,\bigl((x-0.4)(y-0.6) + 0.36\bigr), \tag{19}$$

$$g_2(x,y) = 24.234\,\bigl(r^2 (0.75 - r^2)\bigr), \quad r^2 = (x-0.5)^2 + (y-0.5)^2, \tag{20}$$

$$g_3(x,y) = 42.659\,\bigl(0.1 + \tilde{x}\,(0.05 + \tilde{x}^4 - 10\tilde{x}^2\tilde{y}^2 + 5\tilde{y}^4)\bigr), \quad \tilde{x} = x - 0.5, \ \tilde{y} = y - 0.5, \tag{21}$$

$$g_4(x,y) = 1.3356\,\bigl(1.5(1-x) + e^{2x-1}\sin(3\pi(x-0.6)^2)\bigr) + 1.3356\,\bigl(e^{3(y-0.5)}\sin(4\pi(y-0.9)^2)\bigr), \tag{22}$$

$$g_5(x,y) = 1.9\,\bigl(1.35 + e^{x}\sin(13(x-0.6)^2)\,e^{-y}\sin(7y)\bigr), \tag{23}$$

which are proposed by Hwang et al. [26] to study the learning
and modeling capability of systems (in all of the above
functions it is assumed that $x, y \in [0,1]$). In the following
we use the structure shown in Fig. 3 to model each of these
functions, which are also shown in Fig. 8.
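
For readers who wish to reproduce the benchmark, the five functions of Eqs. (19)-(23) can be written directly in Python. This is a minimal sketch: the function names and the helper sample_dataset are hypothetical, and drawing the points uniformly from the unit square follows the description of the training data given later in this section.

```python
import math
import random


def g1(x, y):
    """Eq. (19)."""
    return 10.391 * ((x - 0.4) * (y - 0.6) + 0.36)


def g2(x, y):
    """Eq. (20)."""
    r2 = (x - 0.5) ** 2 + (y - 0.5) ** 2
    return 24.234 * (r2 * (0.75 - r2))


def g3(x, y):
    """Eq. (21), with shifted variables xt = x - 0.5 and yt = y - 0.5."""
    xt, yt = x - 0.5, y - 0.5
    return 42.659 * (0.1 + xt * (0.05 + xt**4 - 10 * xt**2 * yt**2 + 5 * yt**4))


def g4(x, y):
    """Eq. (22)."""
    return 1.3356 * (
        1.5 * (1 - x)
        + math.exp(2 * x - 1) * math.sin(3 * math.pi * (x - 0.6) ** 2)
        + math.exp(3 * (y - 0.5)) * math.sin(4 * math.pi * (y - 0.9) ** 2)
    )


def g5(x, y):
    """Eq. (23)."""
    return 1.9 * (
        1.35
        + math.exp(x) * math.sin(13 * (x - 0.6) ** 2)
        * math.exp(-y) * math.sin(7 * y)
    )


def sample_dataset(g, n, rng):
    """Draw n points uniformly from [0, 1]^2 and evaluate g on them."""
    points = [(rng.random(), rng.random()) for _ in range(n)]
    return [(x, y, g(x, y)) for x, y in points]


if __name__ == "__main__":
    rng = random.Random(0)
    train = sample_dataset(g1, 225, rng)     # 225 training points
    test = sample_dataset(g1, 10_000, rng)   # 10,000 test points
    print(len(train), len(test))
```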

In each simulation the neuro-fuzzy computing system is trained with
225 training data and evaluated on 10,000 test data, all of which are
randomly selected from the space of input variables. Moreover, the
power of the activation function of the neurons in the hidden layer
(i.e., the value of p in (10)) and the value of a in (11) are set to 7
and 0.0005, respectively (these values were obtained by simple trial
and error, and it was observed during numerical simulations that the
final results are not very sensitive to the values assigned to these
parameters). The Fraction of Variance Unexplained (FVU) performance
index, defined as follows [26],



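$$\mathrm{FVU} = \frac{\sum_{i=1}^{10000} \bigl(\hat{g}(x_i, y_i) - g(x_i, y_i)\bigr)^2}{\sum_{i=1}^{10000} \bigl(g(x_i, y_i) - \bar{g}\bigr)^2}, \qquad \bar{g} = \frac{1}{10000} \sum_{i=1}^{10000} g(x_i, y_i), \tag{24}$$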



is used to evaluate the modeling accuracy of the resulting
neuro-fuzzy computing systems, where $g(x_i, y_i)$ and
$\hat{g}(x_i, y_i)$ are the values generated by the function itself and
by the proposed neuro-fuzzy computing system, respectively. Table I
summarizes the simulation results. Note that since the original
training data consist of crisp numbers, in all simulations each input
datum is fuzzified as shown in Fig. 9 before entering the system. In
this fuzzification process, the support set of the resulting fuzzy
number should be chosen according to the number of available training
data and the accuracy required for modeling the function. As the
number of training data increases, a smaller support set can be chosen
to reach higher modeling accuracy. Table I shows that, for example,
100 neurons are used to cover the universal set of each of x and y,
and 116 neurons are used to cover the universal set of z, when
modeling $g_1$. Given the domain of definition and the range of $g_1$,
it follows that the variables x, y, and z are modeled with maximum
accuracies of 0.01, 0.01, and 0.06, respectively.

Fig. 9. Fuzzification process used to convert crisp training data to
their corresponding fuzzy numbers.
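
The FVU index of Eq. (24) is the residual sum of squares of the model divided by the total sum of squares of the target values around their mean. A minimal Python sketch is given below; the function name fvu and its argument names are hypothetical, and the sketch works for any number of test points rather than only 10,000.

```python
def fvu(targets, predictions):
    """Fraction of Variance Unexplained between targets and predictions.

    targets     -- values produced by the function itself
    predictions -- values produced by the model under evaluation
    """
    n = len(targets)
    if n == 0 or n != len(predictions):
        raise ValueError("targets and predictions must be non-empty and equal in length")
    mean_target = sum(targets) / n
    residual = sum((p - t) ** 2 for t, p in zip(targets, predictions))
    total = sum((t - mean_target) ** 2 for t in targets)
    if total == 0.0:
        raise ValueError("FVU is undefined when all targets are equal")
    return residual / total


if __name__ == "__main__":
    print(fvu([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect model
    print(fvu([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # model that outputs the mean
```

A perfect model gives an FVU of 0, and a model that always outputs the mean of the targets gives an FVU of 1, so smaller values in the tables indicate better models.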

Table II summarizes the results obtained by modeling
$g_1, \ldots, g_5$ with other methods. The first group of results
presented in this table corresponds to the modeling of these functions
by ANNs trained using back-propagation and the Projection Pursuit
Learning (PPL) method (under different conditions and structures). The
second group of results presented in Table II corresponds to the
method proposed by Kwok and Yeung [27] for training the coefficients
of the new hidden units added to a dynamical network using different
cost functions (i.e., $S_1$, $\sqrt{S_1}$, $S_2$, $\sqrt{S_2}$, $S_3$,
$\sqrt{S_3}$, $S_{cascor}$, $S_{fujita}$, and $S_{sqr}$, as defined in
[27]). The third group of results in Table II is obtained by using the
method proposed by Ma and Khorasani [28], which is similar to the
previous method with the difference that, instead of using different
cost functions for training, different Hermite polynomial activation
functions are applied to the neurons of the hidden layer. Murakami and
Honda [29] studied the modeling ability of the Active Learning Method
(ALM) and used it for pattern-based information processing. The ALM
divides the input space into several partitions and then models the
function in each of these partitions through a simple pattern.
Finally, the last group of results in Table II corresponds to the
modeling of the functions using the ANFIS method [30].

Comparing the modeling errors of $g_1$ in Tables I and II shows that
the proposed neuro-fuzzy computing system is less effective than other
methods for modeling approximately linear functions such as $g_1$. The
main reason is that the proposed system divides the space of input
variables into several overlapping subspaces (min-terms) and tries to
model the given function in each of these subspaces by a single
IF-THEN rule. However, it is well known that it is very difficult to
model a linear function by aggregating rules of this kind (which have
a simple fuzzy set in their consequent part). Table I also shows that
in all cases a high percentage of the training data is stored in the
crossbar. The reason is that, since the number of training data is not
large enough, most of the input-output training pairs contain new
information and tend to constitute a new fuzzy min-term. It will be
shown later that this problem can be removed simply by increasing the
number of input-output training pairs.

TABLE I
Accuracies of the models optimized using the proposed method.

| Function | # of neurons for variable x | # of neurons for variable y | # of neurons for variable z | Threshold value | FVU | # of constructed min-terms |
|---|---|---|---|---|---|---|
| $g_1$ | 100 | 100 | 116 | 0.2 | 0.067 | 77 |
| $g_2$ | 100 | 100 | 69 | 0.1 | 0.044 | 161 |
| $g_3$ | 100 | 100 | 143 | 0.1 | 0.263 | 140 |
| $g_4$ | 100 | 100 | 105 | 0.2 | 0.087 | 123 |
| $g_5$ | 100 | 100 | 126 | 0.15 | 0.09 | 130 |

TABLE II
Comparing the accuracies of the models optimized using different algorithms based on FVU criterion [29].

| Ref. | Model | $g_1$ | $g_2$ | $g_3$ | $g_4$ | $g_5$ |
|---|---|---|---|---|---|---|
| [26] | BPL based on Gauss-Newton method (5 hidden units) | 0.001 | 0.065 | 0.506 | 0.080 | 0.142 |
| | BPL based on Gauss-Newton method (10 hidden units) | 0.001 | 0.002 | 0.183 | 0.003 | 0.021 |
| | PPL supersmoother (3 hidden units) | 0.000 | 0.010 | 0.355 | 0.021 | 0.135 |
| | PPL supersmoother (5 hidden units) | 0.000 | 0.007 | 0.248 | 0.000 | 0.028 |
| | PPL Hermite (3 hidden units) | 0.000 | 0.009 | 0.075 | 0.001 | 0.049 |
| | PPL Hermite (5 hidden units) | 0.000 | 0.000 | 0.000 | 0.001 | 0.015 |
| [27] | CFNN with $S_1$ | 0.021 | 0.029 | 0.269 | 0.036 | 0.121 |
| | CFNN with $\sqrt{S_1}$ | 0.011 | 0.028 | 0.247 | 0.037 | 0.111 |
| | CFNN with $S_2$ | 0.095 | 0.426 | 0.547 | 0.636 | 0.610 |
| | CFNN with $\sqrt{S_2}$ | 0.024 | 0.031 | 0.275 | 0.031 | 0.134 |
| | CFNN with $S_3$ | 0.003 | 0.020 | 0.306 | 0.027 | 0.160 |
| | CFNN with $\sqrt{S_3}$ | 0.003 | 0.018 | 0.288 | 0.030 | 0.167 |
| | CFNN with $S_{cascor}$ | 0.025 | 0.027 | 0.265 | 0.031 | 0.121 |
| | CFNN with $S_{fujita}$ | 0.004 | 0.047 | 0.444 | 0.070 | 0.246 |
| | CFNN with $S_{sqr}$ | 0.007 | 0.038 | 0.573 | 0.185 | 0.294 |
| [28] | Standard CFNN with sigmoidal activation functions (10 hidden units) | 0.048 | 0.097 | 0.551 | 0.073 | 0.206 |
| | Proposed CFNN with Hermite polynomial activation functions (10 hidden units) | 0.031 | 0.027 | 0.197 | 0.076 | 0.095 |
| | Standard CFNN with sigmoidal activation functions (20 hidden units) | 0.043 | 0.048 | 0.303 | 0.050 | 0.111 |
| | Proposed CFNN with Hermite polynomial activation functions (20 hidden units) | 0.026 | 0.019 | 0.082 | 0.027 | 0.039 |
| [29] | ALM with 6 partitions | 0.014 | 0.031 | 0.153 | 0.057 | 0.076 |
| | ALM with 7 partitions | 0.015 | 0.027 | 0.132 | 0.060 | 0.062 |
| | ALM with 8 partitions | 0.021 | 0.032 | 0.129 | 0.061 | 0.063 |
| | ALM with 9 partitions | 0.027 | 0.035 | 0.122 | 0.067 | 0.064 |
| [30] | ANFIS with 9 rules | 0.000 | 0.002 | 0.033 | 0.008 | 0.089 |

Here, three important points are worth emphasizing. Firstly, although
some other neuro-fuzzy systems such as ANFIS can outperform the
proposed system, they have the disadvantage that after training the
resulting fuzzy sets and rules have almost no conceptual meaning.
Secondly, except for the ALM, all other methods rely on optimization
methods, which means that they do not have biological support and
cannot be implemented on a large scale. In addition, they suffer from
a very high computational cost. Finally, it should be noted that,
unlike almost all other methods, in the proposed computing system each
training datum is presented to the system only once.

Table III shows the effect of increasing the number of training data
on the accuracy and on the number of constructed fuzzy min-terms of
the resulting system. Note that in each case the number of fuzzy
min-terms is equal to the number of dissimilar training data
sufficient to construct the system. It follows from Table III that the
proposed neuro-fuzzy system can effectively model the functions under
consideration by considering only some of the most important fuzzy
min-terms (training data). Another observation is that the accuracy of
the system can be increased considerably by increasing the number of
input-output training data. Finally, note that since in all
simulations the parameters of the proposed neuro-fuzzy computing
system are obtained by trial and error (and consequently are not
optimal), it is expected that better results can be achieved by using
optimally selected parameters.

TABLE III
Accuracies of the models optimized using the proposed method with different numbers of training data.

| Function | # of training data | FVU | # of min-terms |
|---|---|---|---|
| $g_1$ | 400 | 0.026 | 194 |
| $g_1$ | 700 | 0.021 | 238 |
| $g_3$ | 400 | 0.153 | 216 |
| $g_3$ | 700 | 0.117 | 245 |
| $g_5$ | 400 | 0.058 | 189 |
| $g_5$ | 700 | 0.036 | 229 |


In the following we discuss the ability of the proposed neuro-fuzzy
system to classify data. This problem is important because it shows
the ability of the system to make suitable decisions when facing new
environmental inputs. For this purpose consider four different data
sets (each consisting of two different objects) as shown in Fig. 10.
In each case we first train the system shown in Fig. 3, with only two
output terminals (one per class), using a supervised method. Then we
present the new data points to the system and let it classify them.
Table IV shows the details of the parameters used in the simulations
and the results obtained in each case. Figure 11 shows an example of
how the new data points have been classified by the trained system.
Table IV also shows that in all cases only a small fraction of the
whole training data is sufficient to construct a neuro-fuzzy computing
system able to classify data with high precision.

Finally, we can study the performance of the proposed neuro-fuzzy
computing system when it is trained with noisy data or when a certain
percentage of the memristors at the cross-points are distorted
randomly. First consider the problem of modeling $g_1, \ldots, g_5$ (as
defined in (19)-(23)) when the system is trained with noisy data. For
this purpose, we add Gaussian noise with zero mean and a variance of
0.01 to each of the 225 input-output training data, and then we train
the system shown in Fig. 3 using these noisy data (recall that the
original data are randomly selected, with uniform distribution, from
the domain of definition of the corresponding functions). Table V
shows the values obtained for FVU under the above-mentioned conditions
in a typical simulation. These results show the good robustness of the
proposed system in dealing with noisy data. Repeating the above
simulation assuming that the training data are not noisy but 20
percent of the cross-points are randomly distorted beforehand and
cannot be used to store data, we arrive at the results presented in
Table VI (a randomly chosen value is assigned to each distorted
memristor during simulation). These results clearly show that the
proposed system can tolerate such a distortion fairly well. In fact,
in this case the number of fuzzy min-terms is automatically increased
to improve the accuracy of the system.
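
The two perturbations used in these robustness experiments can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' simulation code: the function names are hypothetical, the noise is assumed to be added independently to every component of each training triple, and the range [low, high] from which a distorted cross-point draws its random value is an assumed parameter that the text does not specify.

```python
import random


def add_gaussian_noise(samples, variance=0.01, rng=None):
    """Add zero-mean Gaussian noise to every component of every sample."""
    if rng is None:
        rng = random.Random()
    sigma = variance ** 0.5   # standard deviation, 0.1 for a variance of 0.01
    return [tuple(value + rng.gauss(0.0, sigma) for value in sample)
            for sample in samples]


def distort_crosspoints(weights, fraction=0.2, low=0.0, high=1.0, rng=None):
    """Assign random values to a randomly chosen fraction of cross-points.

    Returns the distorted weight matrix and the set of faulty positions.
    """
    if rng is None:
        rng = random.Random()
    rows, cols = len(weights), len(weights[0])
    cells = [(i, j) for i in range(rows) for j in range(cols)]
    faulty = set(rng.sample(cells, round(fraction * len(cells))))
    distorted = [
        [rng.uniform(low, high) if (i, j) in faulty else weights[i][j]
         for j in range(cols)]
        for i in range(rows)
    ]
    return distorted, faulty


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [(rng.random(), rng.random(), 0.0) for _ in range(225)]
    noisy = add_gaussian_noise(clean, variance=0.01, rng=rng)
    crossbar = [[0.5] * 10 for _ in range(10)]
    distorted, faulty = distort_crosspoints(crossbar, fraction=0.2, rng=rng)
    print(len(noisy), len(faulty))
```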



TABLE V
Noise tolerance of models using FVU criterion

| Function | $g_1$ | $g_2$ | $g_3$ | $g_4$ | $g_5$ |
|---|---|---|---|---|---|
| FVU | 0.281 | 0.394 | 0.727 | 0.586 | 0.61 |



TABLE VI
Fault tolerance of the proposed structure in modeling application using FVU criterion



logical circuits and fuzzy logic, and ANNs and fuzzy inference
systems. This explanation led to the notion of fuzzy min-terms and a
special two-layer ANN with the capacity to work with fuzzy
input-output data. This neuro-fuzzy computing system has at least four
main advantages compared to many other classical designs. First, it
can effectively be realized on the nano-scale memristor-crossbar
structure. Second, the hardware of the system can be trained simply by
applying the Hebbian learning method, without the need for any
optimization. Third, the proposed structure can effectively work with
a huge number of input-output training data (of fuzzy type) without
facing problems like overtraining. Finally, it has strong biological
support, which makes it a powerful structure to emulate the function
of the human brain.
TABLE IV
Classification rate (%) for each test set

| Data set | # of neurons for variable x | # of neurons for variable y | # of training data | # of constructed min-terms | Classification rate (%) |
|---|---|---|---|---|---|
| 1 | 100 | 100 | 335 | 52 | 99.8 |
| 2 | 90 | 90 | 200 | 45 | 99.36 |
| 3 | 98 | 98 | 1000 | 107 | 99.64 |
| 4 | 94 | 94 | 600 | 40 | 95.83 |




Fig. 10. Data sets used to demonstrate the ability of the proposed neuro-fuzzy system for data classification. 




Fig. 11. This figure shows an example of how the new data points have been classified by the trained system.